\(
\newcommand{\water}{{\rm H_{2}O}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\d}{\mathop{}\!\mathrm{d}}
\newcommand{\grad}{\nabla}
\newcommand{\T}{^\text{T}}
\newcommand{\mathbbone}{\unicode{x1D7D9}}
\renewcommand{\:}{\enspace}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\Tr}{Tr}
\newcommand{\norm}[1]{\lVert #1\rVert}
\newcommand{\KL}[2]{ \text{KL}\left(\left.\rule{0pt}{10pt} #1 \; \right\| \; #2 \right) }
\newcommand{\slashfrac}[2]{\left.#1\middle/#2\right.}
\)
Suppose we want to model a random system. Let \(\; (\Omega, \mathcal{F}, P) \;\) be a probability space and \(\; (E, \mathcal{E}) \;\) be a measurable space.
- The probability space models the inherent randomness of the system: the elements \(\; \omega \in \Omega \;\) are the latent outcomes or states that the system can randomly take (which may or may not be observable). When the system adopts a latent outcome \(\; \omega \;\), we can think of it as "Nature" or "God" having chosen that outcome.
- In contrast, the measurable space models what we observe: each \(\; e \in E \;\) is an observable outcome that occurs because a corresponding latent outcome \(\; \omega \;\) has happened.
A random variable is the missing piece: it is the function that relates each latent outcome \(\; \omega \in \Omega \;\) to its corresponding observable outcome \(\; e \in E \;\).
Let \(\; (\Omega, \mathcal{F}, P) \;\) be a probability space and \(\; (E, \mathcal{E}) \;\) be a measurable space. Then, an \(\; (E, \mathcal{E})\)-valued random variable is a measurable function \(\; X: \Omega \to E \;\).[1]
(Remember: a measurable function is one for which the preimage of every event in the target sigma-algebra \(\; \mathcal{E} \;\) is an event in the origin sigma-algebra \(\; \mathcal{F} \;\).)
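In symbols, for \(\; X: \Omega \to E \;\) to be measurable we require that the preimage of every event lands in \(\; \mathcal{F} \;\):
\[
X^{-1}(B) := \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \text{for every } B \in \mathcal{E}.
\]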
This definition enables us to assign a probability to every event in the target space, \(\; B \in \mathcal{E} \;\), by looking at its preimage in the original space (the domain), which is measurable by assumption.
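Concretely, the probability assigned to an event \(\; B \in \mathcal{E} \;\) is the pushforward of \(\; P \;\) along \(\; X \;\), usually called the distribution (or law) of \(\; X \;\) and written \(\; P_X \;\) here:
\[
P_X(B) := P\left( X^{-1}(B) \right) = P\left( \{ \omega \in \Omega : X(\omega) \in B \} \right), \qquad B \in \mathcal{E}.
\]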
This is useful because, while we could avoid this duplication by defining a new probability space for each random system, it allows us to define many random variables on the same underlying random process, which happens often in real life. For example, if the underlying random process is throwing a die and observing which side it lands on, you could define all of the following random variables from it (see the sketch after the list):
- The actual number that you roll: in this case the random variable describes the underlying system itself. Probability space: \(\; (\Omega, \mathcal{F}, P)\). Measurable space: \(\; (\Omega, \mathcal{F})\). Random variable: \(\; X: \Omega \to \Omega \;\).
- Whether the number rolled is even or odd: this could be defined as another random variable, although it would be more intuitive to define the events "even" and "odd" as the unions of the outcomes 2, 4, 6 and 1, 3, 5 respectively, in the same probability space and with the same random variable as before.
- The area painted black on the side of the die that has been rolled: this may not be proportional to the number rolled, since the dots may not be the same size on all faces. For instance, the "one" face may have a single, larger dot.
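To make the list concrete, here is a minimal Python sketch of the die example, with all three random variables defined on the same underlying probability space; the helper name `pushforward` and the areas in `dot_areas` are made up for illustration.

```python
from fractions import Fraction

# Underlying probability space of a fair die:
# sample space Omega = {1, ..., 6}, sigma-algebra = all subsets of Omega,
# P = uniform measure assigning 1/6 to each outcome.
Omega = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in Omega}

def pushforward(X, B):
    """P_X(B) = P(X^{-1}(B)): sum P over the preimage of the event B."""
    return sum(p for outcome, p in P.items() if X(outcome) in B)

# Three random variables defined on the SAME underlying probability space.

def number(w):
    """The number rolled: the identity map Omega -> Omega."""
    return w

def parity(w):
    """Whether the number rolled is even or odd."""
    return "even" if w % 2 == 0 else "odd"

# Hypothetical black-painted areas (mm^2) per face, with the "one" face
# carrying a single, larger dot -- so area is not proportional to the number.
dot_areas = {1: 20, 2: 16, 3: 24, 4: 32, 5: 40, 6: 48}

def black_area(w):
    """Black-painted area on the face rolled."""
    return dot_areas[w]

print(pushforward(number, {6}))           # 1/6
print(pushforward(parity, {"even"}))      # 1/2
print(pushforward(black_area, {16, 20}))  # 1/3  (faces 1 and 2)
```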
NOTE You are always allowed to take the target (observable) measurable space to be the same as the origin (underlying) measurable space. In that case, the origin sample space coincides with the target state space, and the random variable can simply be the identity.
NOTE The above point leads many people to make this simplification without realizing it when thinking about random variables.
[1] https://en.wikipedia.org/wiki/Random_variable#Measure-theoretic_definition